ABSTRACT

This paper presents an invariant under scaling and lin- 
ear brightness change. The invariant is based on dif- 
ferentials and therefore is a local feature. Rotation- 
ally invariant 2-d differential Gaussian operators up to 
third order are proposed for the implementation of the 
invariant. The performance is analyzed by simulating 
a camera zoom-out. 

1. INTRODUCTION 

In image retrieval systems, efficiency depends promi- 
nently on identifying features that are invariant under 
the transformations that may occur, since such invari- 
ants reduce the search space. Such transformations 
typically include translation, rotation, and scaling, as 
well as linear brightness changes. Ideally, we would 
like to consistently identify key points whose features 
are invariant under those transformations. 

Geometrical invariants have been known for a long 
time, and they have been applied more recently to vision
tasks [7, 9]. Differential invariants are of particular
interest since they are local features and therefore more 
robust in the face of occlusion. Building on a suggestion
by Schmid and Mohr [8], we propose an invariant
with respect to the four aforementioned transforma- 
tions, based on derivatives of Gaussians. 

In the following, we restrict ourselves to 2-d ob- 
jects, i.e. we assume that the objects of interest are 
not rotated outside the image plane, and that they are 
without significant depth so that the whole object is 
in focus simultaneously. The lighting geometry also 
remains constant. In other words, we allow only trans- 
lation and rotation in the image plane, scaling that 
reduces the size of an object (zoom-out), and bright- 
ness changes by a constant factor. The zooming can 
be achieved by either changing the distance between 
object and camera or by changing the focal length. 

2. THE INVARIANT 



2.1. The 1-d case 

Schmid and Mohr [8] have presented the following invariant
under scaling. Let $f(x) = g(u(x)) = g(\alpha x)$,
i.e. $g(u)$ is derived from $f(x)$ by a change of variable
with scaling factor $\alpha$. Then $f(x) = g(u)$,
$f'(x) = \alpha\,g'(u)$, $f''(x) = \alpha^2 g''(u)$, and thus
$\Theta_{12} = f'(x)^2 / f''(x)$ is an invariant to scale change.
This invariant generalizes to

\[
\Theta_{kn}(f(x)) = \frac{[f^{(k)}(x)]^{n/k}}{f^{(n)}(x)} \qquad (1)
\]

where $k, n \in \mathbb{N}$ denote the order of the derivatives.

$\Theta_{kn}$ is not invariant under linear brightness change.
But such an invariance is desirable because it would 
ensure that properties that can be expressed in terms 
of the level curves of the signal are invariant ||. A 
straightforward modification of $\Theta_{12}$ gives us the extended
invariance: Let $f(x) = k\,g(u) = k\,g(\alpha x)$ where
$k$ is the brightness factor. Then $f'(x) = k\alpha\,g'(u)$,
$f''(x) = k\alpha^2 g''(u)$, $f'''(x) = k\alpha^3 g'''(u)$.
It can be seen that

\[
\Theta_{123}(f(x)) = \frac{f'(x)\,f'''(x)}{f''(x)^2}
  = \frac{(k\alpha\,g'(u))\,(k\alpha^3 g'''(u))}{(k\alpha^2 g''(u))^2}
\]

has the desired invariance since both $\alpha$ and $k$ cancel
out. $\Theta_{123}$ can be generalized to
$\Theta_{g3} = f^{(k)}(x)\,f^{(k+2)}(x) / f^{(k+1)}(x)^2$
where $k \in \mathbb{N}$, but $k > 1$ is of little interest
in computer vision.
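
The cancellation of the scaling factor and the brightness factor can be checked numerically. The following Python sketch is illustrative only: the test signal g, the step size h, and all helper names are assumptions, not part of the original work. It estimates derivatives by central differences and compares the second-order ratio and the third-order ratio for g and for f(x) = k g(alpha x).

```python
import math

def derivs(f, x, h=1e-3):
    """Central-difference estimates of f', f'', f''' at x (h is an illustrative choice)."""
    d1 = (f(x + h) - f(x - h)) / (2 * h)
    d2 = (f(x + h) - 2 * f(x) + f(x - h)) / h**2
    d3 = (f(x + 2 * h) - 2 * f(x + h) + 2 * f(x - h) - f(x - 2 * h)) / (2 * h**3)
    return d1, d2, d3

def theta_12(f, x):
    """f'(x)^2 / f''(x): invariant to scale, but not to brightness."""
    d1, d2, _ = derivs(f, x)
    return d1**2 / d2

def theta_123(f, x):
    """f'(x) f'''(x) / f''(x)^2: invariant to scale and to linear brightness change."""
    d1, d2, d3 = derivs(f, x)
    return d1 * d3 / d2**2

def g(u):
    # Arbitrary smooth test signal (assumption).
    return u**4 + math.sin(u)

alpha, k, x = 2.5, 3.0, 0.7   # scaling factor, brightness factor, sample position

def f(x):
    return k * g(alpha * x)

t123_f, t123_g = theta_123(f, x), theta_123(g, alpha * x)
t12_f, t12_g = theta_12(f, x), theta_12(g, alpha * x)
print(t123_f, t123_g)      # agree up to discretization error
print(t12_f, k * t12_g)    # the second-order ratio still carries the factor k
```

In this sketch the third-order ratio agrees for f and g, whereas the second-order ratio differs by the brightness factor k, which is the motivation for the extension.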

An obvious shortcoming of $\Theta_{123}$, as well as of the
other scale invariants discussed so far, is that they are
undefined where the denominator is zero. Therefore,
we modify $\Theta_{123}$ to be continuous everywhere:

\[
\Theta_{m123} =
\begin{cases}
0 & \text{if } c1 \\
f'(x)\,f'''(x) / f''(x)^2 & \text{if } c2 \\
f''(x)^2 / (f'(x)\,f'''(x)) & \text{else}
\end{cases}
\qquad (2)
\]

where c1 is the condition $f''(x) = 0 \wedge f'(x) f'''(x) = 0$,
and c2 specifies $|f'(x) f'''(x)| < |f''(x)^2|$. Note that
this definition results in $-1 \le \Theta_{m123} \le 1$.
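
The case distinction of eq. 2 can be written as a small function. This is a minimal Python sketch, not the author's implementation: the function name is hypothetical, returning 0 under condition c1 is an assumption, and the exact comparison with zero would likely be replaced by a tolerance on real image data.

```python
def theta_m123(d1, d2, d3):
    """Modified invariant from the derivative values f'(x), f''(x), f'''(x)."""
    num = d1 * d3   # f'(x) f'''(x)
    den = d2 * d2   # f''(x)^2
    if den == 0 and num == 0:
        return 0.0              # condition c1 (value 0 is an assumption)
    if abs(num) < abs(den):
        return num / den        # condition c2
    return den / num            # remaining case; num is nonzero here

# Usage: both orientations of the ratio stay within [-1, 1].
print(theta_m123(1.0, 2.0, 1.0))   # 0.25, via the c2 branch
print(theta_m123(2.0, 1.0, 2.0))   # 0.25, via the else branch
```

Switching to the reciprocal whenever the ratio would exceed 1 in magnitude is what keeps the value bounded where the second derivative vanishes.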



2.2. The 2-d case 






If we are to apply eq. 2 to images, we have to generalize
the formula to two dimensions. Also, images are
given typically in the form of sampled intensity val- 
ues rather than in the form of closed formulas where 
derivatives can be computed analytically. One way to 
combine filtering with the computation of derivatives 
can be provided by using Gaussian derivatives [Q, ||, || ■ 
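Let $I_1(x, y)$ and $I_2(u, v) = I_2(\alpha x, \alpha y)$ be two images
related by a scaling factor $\alpha$. Then, according to Schmid
and Mohr [8],

\[
\int_{-\infty}^{\infty} I_1(x,y) \otimes G_{i_1 \ldots i_n}(x,y;\sigma)\, dx\, dy
  = \alpha^n \int_{-\infty}^{\infty} I_2(u,v) \otimes G_{i_1 \ldots i_n}(u,v;\alpha\sigma)\, du\, dv
\qquad (3)
\]

where the $G_{i_1 \ldots i_n}$ are partial derivatives of the 2-d Gaussian.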
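
A 1-d analogue of eq. 3 can be checked numerically. The Python sketch below is an illustration under stated assumptions: the test signal g, the integration range of eight standard deviations, the number of steps, and all function names are choices made here and do not come from the original work. It approximates the convolution with the n-th Gaussian derivative by a Riemann sum and compares both sides of the relation for f(x) = g(alpha x).

```python
import math

def gauss_deriv(t, sigma, n):
    """n-th derivative (n = 0, 1, 2) of the 1-d zero-mean Gaussian at t."""
    g0 = math.exp(-t * t / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)
    if n == 0:
        return g0
    if n == 1:
        return -t / sigma**2 * g0
    if n == 2:
        return (t * t - sigma**2) / sigma**4 * g0
    raise ValueError("only n = 0, 1, 2 are implemented in this sketch")

def response(signal, x0, sigma, n, steps=400, width=8.0):
    """Riemann-sum approximation of the convolution of signal with G^(n) at x0."""
    dt = 2 * width * sigma / steps
    total = 0.0
    for i in range(steps + 1):
        t = -width * sigma + i * dt
        total += signal(x0 - t) * gauss_deriv(t, sigma, n) * dt
    return total

def g(u):
    # Arbitrary smooth test signal (assumption).
    return math.sin(u) + 0.1 * u * u

alpha, sigma, x0 = 2.56, 1.5, 0.4

def f(x):
    return g(alpha * x)

results = {}
for n in (1, 2):
    lhs = response(f, x0, sigma, n)                              # width sigma
    rhs = alpha**n * response(g, alpha * x0, alpha * sigma, n)   # width alpha*sigma
    results[n] = (lhs, rhs)
    print(n, lhs, rhs)
```

Both sides agree for first and second order, with the factor alpha to the power n accounting for the order of the derivative.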

Rotational invariance is a highly desirable property 
in most image retrieval tasks. While derivatives are 
translation invariant, the partial derivatives in eq. 3
are not rotationally invariant. However, there are some 
well-known rotationally invariant differential operators. 
Recall that the 2-d zero mean Gaussian is defined as 

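\[
G(x,y;\sigma) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2+y^2}{2\sigma^2}} \qquad (4)
\]

Then the gradient magnitude

\[
|\mathrm{grad}\, G(x,y;\sigma)| = \sqrt{G_x^2 + G_y^2}
  = \frac{\sqrt{x^2+y^2}}{\sigma^2}\, G \qquad (5)
\]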



is a first order differential operator with the desired 
property. Horn [3] gives the following second order operators:

\[
\mathrm{LoG}(x,y;\sigma) = G_{xx} + G_{yy}
  = \frac{x^2 + y^2 - 2\sigma^2}{\sigma^4}\, G \qquad (6)
\]

\[
\mathrm{QV}(x,y;\sigma) = \sqrt{G_{xx}^2 + 2 G_{xy}^2 + G_{yy}^2}
  = \frac{\sqrt{(x^2-\sigma^2)^2 + 2x^2y^2 + (y^2-\sigma^2)^2}}{\sigma^4}\, G \qquad (7)
\]



where LoG is the Laplacian of Gaussian. QV stands for 
Quadratic Variation where we have taken the square 
root in order to avoid high powers. Schmid and Mohr 
also suggest what they call $\nu[2]$:

\[
\nu[2](x,y;\sigma) = G_{xx} G_x^2 + 2 G_{xy} G_x G_y + G_{yy} G_y^2
  = \frac{(x^4 - \sigma^2 x^2) + 2x^2y^2 + (y^4 - \sigma^2 y^2)}{\sigma^8}\, G^3 \qquad (8)
\]
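
That these operators depend only on the distance from the origin can be verified by evaluating them at a point and at a rotated copy of that point. The following Python sketch assembles the operators from the analytic partial derivatives of the Gaussian; the function names, the sample point, and the rotation angle are illustrative assumptions.

```python
import math

def gaussian(x, y, s):
    """2-d zero-mean Gaussian with standard deviation s."""
    return math.exp(-(x * x + y * y) / (2 * s * s)) / (2 * math.pi * s * s)

def partials(x, y, s):
    """First and second partial derivatives of the Gaussian at (x, y)."""
    g = gaussian(x, y, s)
    gx, gy = -x / s**2 * g, -y / s**2 * g
    gxx = (x * x - s * s) / s**4 * g
    gyy = (y * y - s * s) / s**4 * g
    gxy = x * y / s**4 * g
    return gx, gy, gxx, gxy, gyy

def operators(x, y, s):
    """Gradient magnitude, LoG, QV and nu[2], assembled from the partials."""
    gx, gy, gxx, gxy, gyy = partials(x, y, s)
    return {
        "gradient": math.hypot(gx, gy),
        "LoG": gxx + gyy,
        "QV": math.sqrt(gxx**2 + 2 * gxy**2 + gyy**2),
        "nu2": gxx * gx**2 + 2 * gxy * gx * gy + gyy * gy**2,
    }

def rotate(x, y, phi):
    """Rotate the point (x, y) by the angle phi about the origin."""
    return (x * math.cos(phi) - y * math.sin(phi),
            x * math.sin(phi) + y * math.cos(phi))

s = 3.0
p = (1.0, 2.0)
q = rotate(p[0], p[1], 0.7)
ops_p, ops_q = operators(p[0], p[1], s), operators(q[0], q[1], s)
for name in ops_p:
    print(name, ops_p[name], ops_q[name])   # equal up to rounding
```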

Analogous to QV, we define a third order differential
operator which we call Cubic Variation to be

\[
\mathrm{CV}(x,y;\sigma) = \sqrt{G_{xxx}^2 + 3 G_{xxy}^2 + 3 G_{xyy}^2 + G_{yyy}^2} \qquad (9)
\]

By contrast, Schmid and Mohr [8] use $\nu[6]$ and

\[
\nu[8](x,y;\sigma) = G_{xxx} G_x^3 + 3 G_{xxy} G_x^2 G_y
  + 3 G_{xyy} G_x G_y^2 + G_{yyy} G_y^3
  = \frac{(x^3 - 3\sigma^2 x)\,x^3 + \ldots}{\sigma^{12}}\, G^4 \qquad (11)
\]

These operators are shown in fig. 2 for $\sigma = 3$. Given
this choice of operators, the criteria on which we select 
the operators are as follows: 

• The operator must fulfill eq. 3, i.e. scaling by $\alpha$
(of both x and y simultaneously) should return a
factor of $\alpha^n$, where n is the order of the operator,
so that eq. 2 is indeed a scale invariant.

• The operators used to compute $\Theta_{m123}$ should differ
in shape as much as possible from each other
in order to deliver more discriminative results.

With respect to the first criterion, the gradient returns
a factor $\alpha$, as required for a first order differential
operator, and the LoG and QV return a factor of $\alpha^2$, but
$\nu[2]$ returns $\alpha^4$. This cannot be remedied by taking
the square root since $\nu[2]$ is negative at some points.
As for the third order operators, CV returns $\alpha^3$, while
$\nu[6]$ and $\nu[8]$ return $\alpha^6$. We can take the square root of
$\nu[6]$ but not of $\nu[8]$. Where the second criterion is
concerned, the LoG is preferable to QV since the LoG has
both positive and negative coefficients, which makes it
unique compared to all other operators. It is not obvious
whether CV or $\sqrt{\nu[6]}$ has more discriminatory
power; the difference between them seems negligible.
$\sqrt{\nu[6]}$ has a slightly more compact support, but the
coefficients are an order of magnitude smaller than those
of CV and the other operators. Fig. 3 shows cross sections
through the center of some of the operators in
fig. 2. In our experiments, we used the Gradient, LoG,
and CV as the differential operators to compute $\Theta_{m123}$
according to eq. 2. Since Gradient and CV are always
positive or zero, we have $0 \le \Theta_{m123} \le 1$.
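
The factors named under the first criterion can be illustrated with a short Python sketch. It is a simplification under stated assumptions: a smooth analytic function stands in for the Gaussian-smoothed image, partial derivatives are estimated by central differences, and the function names, the sample point, and the step size are hypothetical choices. The operators are evaluated on a function and on its spatially scaled version, and the ratio of the results is compared with the expected power of the scaling factor.

```python
import math

def partials(F, x, y, h=1e-3):
    """Central-difference first and second partial derivatives of F at (x, y)."""
    fx = (F(x + h, y) - F(x - h, y)) / (2 * h)
    fy = (F(x, y + h) - F(x, y - h)) / (2 * h)
    fxx = (F(x + h, y) - 2 * F(x, y) + F(x - h, y)) / h**2
    fyy = (F(x, y + h) - 2 * F(x, y) + F(x, y - h)) / h**2
    fxy = (F(x + h, y + h) - F(x + h, y - h)
           - F(x - h, y + h) + F(x - h, y - h)) / (4 * h**2)
    return fx, fy, fxx, fxy, fyy

def operators(F, x, y):
    """Gradient magnitude, LoG, QV and nu[2] of F at (x, y)."""
    fx, fy, fxx, fxy, fyy = partials(F, x, y)
    return {
        "gradient": math.hypot(fx, fy),
        "LoG": fxx + fyy,
        "QV": math.sqrt(fxx**2 + 2 * fxy**2 + fyy**2),
        "nu2": fxx * fx**2 + 2 * fxy * fx * fy + fyy * fy**2,
    }

def L2(u, v):
    # Hypothetical smooth stand-in for a Gaussian-smoothed image.
    return math.sin(u) * math.cos(0.5 * v) + 0.2 * u * v

alpha = 2.56

def L1(x, y):
    # The same function under a spatial scaling by alpha.
    return L2(alpha * x, alpha * y)

x, y = 0.3, 0.5
ops1 = operators(L1, x, y)
ops2 = operators(L2, alpha * x, alpha * y)
factors = {name: ops1[name] / ops2[name] for name in ops1}
expected = {"gradient": alpha, "LoG": alpha**2, "QV": alpha**2, "nu2": alpha**4}
for name in factors:
    print(name, factors[name], expected[name])
```

At the chosen sample point the value of the nu[2] expression is negative, which is an instance of why taking its square root is not an option.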

Note that eqs. 5 to 11 suggest two ways of implementation.
Either kernels representing the partial
derivatives of the Gaussian can be used, and the operators
are assembled from those kernels according to
the left hand sides of the equations, or a characteristic
filter is designed in each case according to the right
hand sides of the equations.
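
For the Laplacian of Gaussian, the two routes can be compared directly on sampled kernels. The Python sketch below is illustrative: the kernel half-width of three standard deviations, the integer sampling grid, and the function names are assumptions. It restricts itself to the LoG, which is linear in the partial derivatives, so both routes yield the same kernel; operators involving squares or square roots would need the partial responses to be combined per pixel when assembled from kernels.

```python
import math

def gaussian(x, y, s):
    """2-d zero-mean Gaussian with standard deviation s."""
    return math.exp(-(x * x + y * y) / (2 * s * s)) / (2 * math.pi * s * s)

def grid(half):
    """Integer sample positions of a (2*half+1) x (2*half+1) kernel, row by row."""
    return [(x, y) for y in range(-half, half + 1) for x in range(-half, half + 1)]

def log_from_partials(s, half):
    """Assemble the LoG kernel from sampled G_xx and G_yy kernels."""
    gxx = [(x * x - s * s) / s**4 * gaussian(x, y, s) for x, y in grid(half)]
    gyy = [(y * y - s * s) / s**4 * gaussian(x, y, s) for x, y in grid(half)]
    return [a + b for a, b in zip(gxx, gyy)]

def log_direct(s, half):
    """Sample the closed-form LoG expression as one characteristic filter."""
    return [(x * x + y * y - 2 * s * s) / s**4 * gaussian(x, y, s)
            for x, y in grid(half)]

sigma = 3.0
half = 9   # three standard deviations (assumption)
k_assembled = log_from_partials(sigma, half)
k_direct = log_direct(sigma, half)
max_diff = max(abs(a - b) for a, b in zip(k_assembled, k_direct))
print(len(k_direct), max_diff)   # 361 coefficients, difference at rounding level
```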



3. SIMULATION 

The infinite integrals in eq. 3 can only be approximated
by sampled, finite signals and filters. Furthermore,
in cameras, where the number of pixels is constant,
a world object is mapped into fewer pixels as the camera
zooms out, leading to increasing spatial integration
over the object and ultimately to aliasing. This
means that the computation of $\Theta_{m123}$ necessarily has
an error. Equation 3 suggests a way to analyze the
accuracy of $\Theta_{m123}$ by simulating the zoom-out process.
The left hand side can be thought of as a scaling by
filtering (SF) process while the right hand side could
be called scaling by optical zooming (SO), where we deal
with a scaled, i.e. reduced, image and an appropriately
adjusted Gaussian operator. Here, scaling by optical
zooming serves to simulate the imaging process as the
camera moves away from an object. The two processes
are schematically depicted in fig. 1.

The inputs to the simulation are 8-bit images taken
by a real camera. The scaling factor $\alpha$ is a free
parameter, but it is chosen such that the downsampling maps
an integer number of pixels into integers. Both SF and
SO start off with a lowpass filtering step. This
prefiltering, implemented by a Gaussian with $\sigma = \alpha$,
improves the robustness of the computations significantly,
as derivatives are sensitive to noise. Also, lowpass
filtering reduces aliasing at the subsequent downsampling
step.

In SO, if the image function had a known analytical
form, we could do the scaling by replacing the
spatial variables $x$ and $y$ by $\alpha x$ and $\alpha y$. But images
are typically given as intensity matrices. Therefore,
the downscaling is done by interpolation, using cubic
splines¹. We then apply the differential operators (Gradient,
LoG, CV) with the appropriate value of $\alpha\sigma$ to the
image and compute the invariant $\Theta^{SO}_{m123}$. By contrast,
in SF, the operators are applied to the original size
image. The invariants are computed and then downscaled,
using again cubic spline interpolation, to the
same size as the image coming out of the SO process,
so that $\Theta^{SF}_{m123}$ and $\Theta^{SO}_{m123}$ can be compared directly at
each pixel.

4. EXPERIMENTS 

Fig. 4 demonstrates the simulation process on a real
image. The original image, 256x256, in the top row, is
downscaled to 100x100, i.e. by a factor $\alpha = 2.56$. The
second and third row show $\Theta_{m123}$ as the results of SF
and SO, respectively, at all pixel locations. Fig. 5 shows
the absolute difference between $\Theta^{SF}_{m123}$ and $\Theta^{SO}_{m123}$, where
the four boundary rows and columns have been set to
zero in order to mask the boundary effects. Note that
the difference is roughly a factor 100 smaller than the
values of $\Theta^{SO}_{m123}$ or $\Theta^{SF}_{m123}$.

¹ The Matlab function spline() is employed.

Figure 1: (left) scaling by filtering process
vs. (right) scaling by simulated optical zooming.

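In order to quantify the error, we have varied $\alpha$ from
1 to 2.56, sampled such that the downscaled image has
an integer pixel size, and computed the global absolute
differences

\[
\Delta\Theta_{m123}(\alpha) = \max_{i,j} \left| \Theta^{SF}_{m123}(i,j) - \Theta^{SO}_{m123}(i,j) \right| \qquad (12)
\]

where i,j range over all non-boundary pixels of the
respective image, as well as the global relative error, in
percent:

\[
e_{gr}(\alpha) = 100 \cdot \frac{\Delta\Theta_{m123}}{\max_{i,j}\left(|\Theta^{SO}_{m123}(i,j)|\right)} \qquad (13)
\]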
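
The two global error measures can be computed as in the following Python sketch. It is a minimal illustration, not the author's code: the function name is hypothetical, the invariant maps are assumed to be equally sized nested lists, the default border of four pixels follows the masking of boundary rows and columns described for the difference image, and normalizing by the SO values is an assumption.

```python
def error_measures(theta_sf, theta_so, border=4):
    """Global absolute difference and global relative error in percent.

    theta_sf, theta_so: equally sized 2-d lists of invariant values.
    border: number of boundary rows and columns excluded on each side.
    """
    rows, cols = len(theta_sf), len(theta_sf[0])
    inner = [(i, j)
             for i in range(border, rows - border)
             for j in range(border, cols - border)]
    delta = max(abs(theta_sf[i][j] - theta_so[i][j]) for i, j in inner)
    peak = max(abs(theta_so[i][j]) for i, j in inner)   # normalization (assumption)
    return delta, 100.0 * delta / peak

# Usage with synthetic values: one interior deviation, one masked boundary deviation.
so = [[0.5] * 10 for _ in range(10)]
sf = [row[:] for row in so]
sf[5][5] += 0.005   # interior pixel, counted
sf[0][0] += 1.0     # boundary pixel, ignored
delta, e_gr = error_measures(sf, so)
print(delta, e_gr)  # about 0.005 and about 1 percent
```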

The graphs of these measures are shown in fig. 7. We
see that the global relative error is less than 1.3% even
as $\alpha$ becomes as large as 2.56. For the set of images we
worked with, we have observed 0.5% < max($e_{gr}$) < 2%
for $\alpha < 3$. We also noted that without prefiltering $e_{gr}$
can become as large as 10%.

Fig. 6 shows the relative error per pixel in percent,
i.e. $100\,\Delta\Theta_{m123}(i,j) / |\Theta^{SO}_{m123}(i,j)|$, but only for those
pixels (i, j) where $\Theta^{SO}_{m123}(i,j)$ is larger than the average
value of $\Theta^{SO}_{m123}$. We find that error to be less than 3.5%
anywhere in the given example.



5. OUTLOOK 

In the context of image retrieval, there remain some 
major issues to be addressed. First, the performance 
of the proposed invariant has to be analyzed on se- 
quences of images taken at increasing object-camera 
distance, i.e. the simulation of the zoom-out process 
has to be replaced with a true camera zoom-out. Intrinsic
limitations of the image formation process by CCD
cameras [2] can be expected to somewhat decrease the
accuracy of the invariant. 

Second, a scheme for keypoint extraction must be 
devised. For efficiency reasons, matching will be done 
on those keypoints only. Ideally, the keypoints should 
be reliably identifiable, irrespective of scale. 

Third, the proposed invariant must be combined 
with a scale selection scheme. Note that in the simulation
above, we knew a priori the right values for $\alpha$ and
therefore for $\alpha\sigma$ and the corresponding filter size. But
such is not the case in general object retrieval tasks.
Selecting stable scales is an active research area [5, 6].

6. ACKNOWLEDGEMENTS 

The author would like to thank Bob Woodham and 
David Lowe for their valuable feedback. 



7. REFERENCES 

[1] B. ter Haar Romeny, "Geometry-Driven Diffusion
in Computer Vision", Kluwer 1994.

[2] G. Holst, "Sampling, Aliasing, and Data Fidelity",
JCD Publishing & SPIE Press, 1998.

[3] B. Horn, "Robot Vision", MIT Press, 1986.

[4] A. Jain, "Fundamentals of Digital Image Processing",
Prentice-Hall, 1989.

[5] T. Lindeberg, "Scale-Space Theory: A Basic Tool 
for Analysing Structures at Different Scales", J. of 
Applied Statistics, Vol.21, No.2, pp.223-261, 1994. 

[6] D. Lowe, "Object Recognition from Local Scale- 
Invariant Features", ICCV, Kerkyra 1999. 

[7] J. Mundy, A. Zisserman, "Geometric Invariance in 
Computer Vision", MIT Press, Cambridge 1992. 

[8] C. Schmid, R. Mohr, "Local grayvalue invariants 
for Image Retrieval", IEEE Trans. PAMI, Vol.19, 
No.5, pp.530-535, May 1997. 

[9] I. Weiss, "Geometric Invariants and Object Recog- 
nition", Int. Journal of Computer Vision, Vol.10, 
No.3, pp.207-231, 1993. 




Figure 2: Rotationally invariant 2-d differential
operators, $\sigma = 3.0$: (a) Gaussian (b) Gradient
(c) Quadratic Variation (d) $\nu[2]$ (e) Laplacian of
Gaussian (f) Cubic Variation (g) $\sqrt{\nu[6]}$ (h) $\nu[8]$.




Figure 3: Cross sections through differential operators,
$\sigma = 3.0$: (a) Gradient (b) Laplacian of Gaussian
(c) Quadratic Variation (d) Cubic Variation (e) $\sqrt{\nu[6]}$.
The circles mark the filter coefficients.





Figure 4: (a) Original 256x256 image (b) $\Theta^{SF}_{m123}$ for
$\alpha = 2.56$ (c) $\Theta^{SO}_{m123}$ for $\alpha = 2.56$.

Figure 5: Absolute difference $|\Theta^{SF}_{m123} - \Theta^{SO}_{m123}|$. Note
the smaller scale ($10^{-4}$) of the error compared to fig. 4.

Figure 6: Relative errors at above-average values of
$\Theta^{SO}_{m123}$ for $\alpha = 2.56$, in percent.




Figure 7: (a) $\Delta\Theta_{m123}(\alpha)$, (b) $\max_{i,j}(|\Theta^{SO}_{m123}|)(\alpha)$,
(c) $e_{gr}(\alpha)$, over image size in pixels; $1 \le \alpha \le 2.56$.



